Decision-support technologies are founded on the paradigm that direct
judgments are less reliable and less valid than synthetic inferences produced
from more fragmentary judgments. Moreover, certain types of fragments are
normally assumed to be more valid than others. In particular, judgments
about the likelihood of a certain state of affairs given a particular set
of data (diagnostic inferences) are routinely fabricated from judgments about
the likelihood of that data given various states of affairs (causal inferences),
and not vice versa. This study was designed to test the benefits of causal
synthesis schemes by comparing the validity of causal and diagnostic judgments
against ground-truth standards.

The results demonstrate that the validities of causal and diagnostic
inferences are strikingly similar; direct diagnostic estimates of conditional
probabilities  were  found  to  be  as  accurate  as  their  synthetic  counterparts 
deduced  from  causal  judgments.  The  reverse  is  equally  true.  Moreover,  these 
accuracies  were  found  to  be  roughly  equal  for  each  causal  category  tested. 
Thus,  if  the  validity  of  judgments  produced  by  a  given  mode  of  reasoning  is 
a  measure  of  whether  it  matches  the  format  of  human  semantic  memory,  then 
neither  one  of  the  causal  or  diagnostic  schema  is  a  more  universal  or  more 
natural  format  for  encoding  knowledge  about  common,  everyday  experiences. 

These findings imply that one should approach the 'divide and conquer'
ritual  with  caution;  not  every  division  leads  to  a  conquest,  even  when  the 
atoms  are  cast  in  causal  phrasings.  Dogmatic  decompositions  performed  at 
the  expense  of  conceptual  simplicity  may  lead  to  inferences  of  lower  quality 
than  those  of  direct,  unaided  judgments. 

On the Value of Synthetic Judgments


The  objective  of  this  study  is  to  investigate  empirically  the  conditions 
under  which  synthetic  conclusions  produced  from  'fragmentary'  judgments  are 
more  valid  than  direct,  unaided  inferences.  We  define  'validity'  as  the 
proximity  between  an  assertion  and  the  actual  experience  upon  which  it  is 
based.  The  implicit  assumption  that  synthetic  conclusions  are  more  valid  than 
their  direct  counterparts  is  the  basis  for  advocating  the  usefulness  of  all 
decision aiding technologies. Quoting Slovic, Fischhoff, and Lichtenstein
(1977): 

Most  of  these  decision  aids  rely  on  the  principle  of  divide  and 
conquer. This "decomposition" approach is a constructive response to
the  problem  of  cognitive  overload.  The  decision  aid  fractionates  the 
total  problem  into  a  series  of  structurally  related  parts,  and  the 
decision  maker  is  asked  to  make  subjective  assessments  for  only  the 
smallest  components.  Such  assessments  are  presumably  simpler  and  more 
manageable than assessing more global entities. Research showing that
decomposition  improves  judgment  has  been  reported  by  Armstrong,  Denniston 
&  Gordon  (1975),  Gettys  et  al.  (1973),  and  by  [Edwards,  Phillips,  Hays, 
and  Goodman  (1968)]. 

Critics  of  the  decomposition  approach  would  argue  that  many  of  the 
aids  require  assessments  of  quantities  the  decision  maker  has  never 
thought  about,  and  that  these  apparently  simple  assessments  may  be 
psychologically  more  complex  than  the  original  decision.  In  some 
situations,  people  may  really  know  what  they  want  to  do  better  than 
they know how to assess the inputs required for the decision aid
(pp. 17-18).

A closer look at decision aiding techniques reveals that the structuring
procedures used fall into two categories: cascading and inversion. Cascading
entails the chaining of a sequence of local judgments to produce the global
inference. Inversion involves converting the direction of certain relations
to a format more compatible with the decision maker's conceptualization of the
environment. A typical example of cascading would involve inferring the
consequence of a long sequence of actions. That is normally done by separately
considering the effect of each individual action in the chain. Similarly, the
aggregation of pieces of evidence in a multi-stage inferencing task would be an
instance of cascading. For example, in the practice of decision analysis, the
quality of actions is invariably inferred from judgments about the desirability
of the actions' consequences cascaded by judgments about the likelihood of those
consequences. Decision analysts never accept direct judgments about preferences
on actions.

The most prevalent example of inversion is the insistence of decision
analysts  that  information  connecting  evidential  data  with  the  hypothesis  be  cast 
in  causal  phrasings.  Judgments  about  the  likelihood  of  a  certain  hypothesis 
given  a  particular  set  of  data  (diagnostic  inferences)  are  routinely  fabricated 
from  judgments  about  the  likelihood  of  that  data  given  various  states  of  affairs 
(causal  inferences),  and  not  vice  versa  (Edwards  et  al.,  1968;  Howard,  1968; 
Raiffa,  1969;  Tribus,  1969). 

The  experiments  of  Armstrong,  Denniston,  and  Gordon  (1975)  and  Gettys, 
Michel,  and  Steiger  (1973)  were  directed  toward  verifying  the  benefit  of 
cascading  inferences.  Armstrong  et  al.  had  subjects  answer  almanac-type 
questions;  they  tried  to  estimate  some  quantity  (e.g.,  the  number  of  pounds 
of  tobacco  processed  in  the  U.S.  in  1972)  about  which  they  had  little  or 
no  a  priori  knowledge.  Some  subjects  attempted  to  answer  the  overall  'global' 
question, while others were instructed how to break the problem down into
smaller subproblems. They found that decomposing the global problem into
subproblems was helpful, especially on those problems where the subject knew
practically nothing beforehand. Gettys et al. tested the validity of
likelihood estimates in the context of multi-staged hierarchical inference
tasks. They compared posterior odds mentally assessed by subjects with the
posterior odds calculated from Bayes' theorem and based on the actual
histograms displayed to the subjects. Oddly enough, in their first experiment,
which involved relations among height, scores, gender, and majors of students,
direct mental assessments proved almost as accurate as those computed from the
optimum model. Only in their second experiment, involving a version of the urn
problem, did superiority of synthetic judgments surface. The results of the
Armstrong et al. and Gettys et al. studies suggest that synthetic cascading
inferences may indeed be a useful device in some instances. However, the
question  of  how  fine  a  division  to  employ  should  be  approached  with  caution. 
Whereas the child who is learning arithmetic normally views multiplication
as a sequence of addition operations, such a view may be detrimental to the
more advanced student. Similarly, the pianist ought not to view his movements
as being composed of individual muscular activations, but rather as a pattern of
global entities such as scales, chords, arpeggios, and the like. In the same
vein,  one  may  argue  that  as  the  decision  maker  becomes  more  familiar  with  the 
task  environment  he  may  achieve  a  state  where  unaided  global  inferences  become 
more  valid  than  their  synthetic  counterparts. 

This paper focuses on the issue of causal/diagnostic inversion. The impetus
for the hypothesis that causal judgments are more natural than diagnostic
judgments may come from the fact that in statistical applications P(data|hypothesis)
typically is obtained directly from a so-called statistical model, like the
assumption that a set of observations is normally distributed within given
parameters  (Edwards  et  al.,  1968).  This  asymmetry  also  underlies  the  celebrated 
urn  model.  However,  it  is  not  at  all  clear  whether  any  bias  in  favor  of  causal 
schema  exists  in  cases  where  a  parametric  statistical  model  is  not  obvious  and 
where both P(data|hypothesis) and P(hypothesis|data) are inferred by accessing
semantic  memory  about  everyday  experiences. 

Tversky  and  Kahneman  (1977)  indeed  detected  what  they  called  'causal  biases' 
in  decision  making.  They  showed  that  subjects  perceive  causal  information  to 
have  a  greater  impact  than  diagnostic  information  of  equal  informativeness. 
Further,  if  some  information  has  both  causal  and  diagnostic  implications,  then 
subjects'  judgments  are  'dominated'  by  the  causal  rather  than  the  diagnostic 
relationship.  Granted  that  causal  reasoning  is  more  emphasized  in  ordinary 
inference  tasks,  the  question  of  the  conditions  under  which  the  causal  mode 
of  reasoning  will  lead  to  more  valid  inferences  still  remains. 

Aside from its psychological interest, this question has also acquired
technological import. One application has already been alluded to, that of
guiding the procedures used by decision analysts in eliciting likelihood
estimates. The second application concerns organization of knowledge-based computer
expert systems (Feigenbaum, 1977). In this latter application judgments from
experts are encoded in the form of heuristic rules which are later combined to
yield expert-like conclusions, explanations, and interpretations. The
appropriate format for these fragmentary judgments is still subject to debate. Some
knowledge-based  systems  (e.g.,  Shortliffe's  MYCIN,  1976)  insist  on  diagnostic 
inputs.  Others  (e.g.,  Ben-Bassat's  MEDAS,  1980)  require  the  more  traditional 
causal  judgments.  The  issue  is  whether  experts,  such  as  physicians,  find  it 
more  comfortable  to  estimate  the  likelihood  of  diseases  given  a  set  of  symptoms 
or  to  evaluate  the  likelihood  that  a  given  disease  be  accompanied  by  a  certain 
set of symptoms. Comfort aside, which form of input yields more valid
therapeutic recommendations?

The  experiment  reported  in  this  paper  was  designed  to  shed  light  on  some 
of  these  issues.  The  problem  of  testing  judgment  validity,  which  has  long 
been  exacerbated  by  the  lack  of  suitable  criteria  for  measuring  the  quality  of 
judgments  about  real-life  experiences,  was  circumvented  by  'creating'  our  own 
ground-truth data.


Method 


Subjects 

One hundred seven undergraduate engineering students and 58 graduate
students  from  various  departments  at  UCLA  participated  in  this  study.  The 
undergraduates,  who  were  enrolled  in  one  of  two  upper  level  undergraduate 
engineering  classes,  served  in  the  experiment  as  part  of  an  in-class  lecture. 

The  graduate  students  were  recruited  via  advertisements  posted  around  the 
campus  and  in  the  campus  newspaper,  and  they  were  paid  according  to  the 
accuracy  of  their  judgments. 

Materials 

The  undergraduates  participated  in  the  first  phase  of  the  study.  Their 
task  was  to  answer  24  yes/no  questions  concerning  their  activities  and  beliefs; 
the  answers  provided  the  data  base  (ground  truth)  for  the  estimation  phase  which 
followed.  The  questions  were  of  two  types:  'X  questions'  and  'Y  questions', 
equal  in  number  and  randomly  ordered  on  the  questionnaire.  Each  X  query 
questioned  a  condition,  activity,  or  belief  considered  by  the  experimenters 
to  be  a  causal  agent  for  a  condition,  activity,  or  belief  specified  in  one  Y 
query.  For  example,  since  the  color  of  a  person's  eyes  is  perceived  to  be 
influenced  by  that  person's  parents,  not  vice  versa,  X  may  represent  the  event 
of a mother having blue eyes and Y may denote the condition of her daughter
having blue eyes. Four categories of causal relations were employed: (1) genetic
causality,  where  a  genetic  condition  specified  by  the  X  question  serves  as  a 
cause  of  the  condition  designated  by  the  Y  question;  (2)  training  causality, 
where  the  X  condition  provides  training  for  the  Y  activity;  (3)  habit-forming 
causality,  where  the  X  condition  serves  as  a  habit-forming  agent  for  the 
behavior specified by the Y condition; and finally, (4) self-interest
causality, where the X question defines a particular self-interest that leads to
the belief unveiled by the Y question. Table 1 shows the four causal categories
and the corresponding X and Y questions for each category.


Insert  Table  1  about  here 


The  data  compiled  from  the  undergraduates'  responses  served  as  the  estimation 
targets  in  the  second  phase  of  the  experiment.  In  this  phase,  the  graduate 
students'  task  was  to  estimate  the  proportion  of  undergraduates  responding  in 
particular  ways  on  the  questionnaire.  For  a  given  X-Y  relation,  each  graduate 
student  was  instructed  to  estimate  either  a  causal  triplet  or  a  diagnostic 
triplet.  When  estimating  the  causal  triplet,  the  estimator  first  considered 
P(X)  (e.g.,  the  proportion  of  undergraduates  who  said  their  mother  had  blue 
eyes),  then  P(Y|X)  (e.g.,  the  proportion  of  those  undergraduates  who  said 
their  mother  had  blue  eyes  and  also  said  they  themselves  have  blue  eyes),  and  then 
P(Y|X̄) (e.g., the proportion of those undergraduates who said their mother did
not  have  blue  eyes,  but  said  they  themselves  have  blue  eyes).  In  assessing  a 
diagnostic  triplet,  the  student  first  estimated  P(Y)  (e.g.,  the  proportion  of 
undergraduates  who  said  they  have  blue  eyes),  then  P(X|Y)  (e.g.,  the  proportion 
of  those  undergraduates  who  said  they  have  blue  eyes,  and  also  said  their 
mother had blue eyes), then P(X|Ȳ) (e.g., the proportion of those undergraduates
who  said  they  do  not  have  blue  eyes,  but  said  their  mother  had  blue  eyes).  Note 
that the three components of each triplet represent statistically independent
quantities  and,  moreover,  that  every  component  can  be  deduced  from  the  three 
members  of  the  opposing  triplet  via  Bayes'  theorem. 
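
As an illustration of how the two triplets relate to the questionnaire data, the following Python sketch tabulates both triplets from a list of paired yes/no answers. The function name `triplets_from_answers` and the example answers in the usage note are hypothetical and are not taken from the study; the sketch assumes that each of X, not-X, Y, and not-Y occurs at least once among the respondents.

```python
from typing import Iterable, Tuple

Triplet = Tuple[float, float, float]


def triplets_from_answers(
    answers: Iterable[Tuple[bool, bool]]
) -> Tuple[Triplet, Triplet]:
    """Return (causal, diagnostic) triplets from paired (x, y) yes/no answers.

    causal     = (P(X), P(Y|X), P(Y|not X))
    diagnostic = (P(Y), P(X|Y), P(X|not Y))
    """
    answers = list(answers)
    n = len(answers)
    n_x = sum(1 for x, _ in answers if x)          # respondents answering yes to X
    n_y = sum(1 for _, y in answers if y)          # respondents answering yes to Y
    n_xy = sum(1 for x, y in answers if x and y)   # yes to both

    # Assumption of this sketch: every conditioning event is non-empty.
    if min(n_x, n - n_x, n_y, n - n_y) == 0:
        raise ValueError("each of X, not-X, Y, not-Y must occur at least once")

    causal = (n_x / n, n_xy / n_x, (n_y - n_xy) / (n - n_x))
    diagnostic = (n_y / n, n_xy / n_y, (n_x - n_xy) / (n - n_y))
    return causal, diagnostic
```

For ten hypothetical respondents of whom two answer yes to both questions, two to X only, one to Y only, and five to neither, the causal triplet is (.40, .50, 1/6) and the diagnostic triplet is (.30, 2/3, 2/7).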

Procedure 

The  undergraduates  answered  the  questionnaire  during  a  regularly  scheduled 
class  meeting.  The  graduate  students  were  assembled  in  groups  ranging  in  size 
from  4  to  15  persons.  Before  they  began  the  task,  the  graduate  students  were 
told about the nature of the estimations they would be making, and about
the 'pay scale' which was dependent on the proximity of their estimates to the
actual  proportions  computed  from  the  undergraduates'  responses.  Half  of  the 
graduate  students  estimated  causal  triplets  for  odd-numbered  X-Y  relations  and 
diagnostic  triplets  for  even-numbered  relations.  For  the  other  half  of  the 
subjects,  this  pattern  was  reversed. 

Each  graduate  student  estimated  one  triplet  for  each  of  the  12  relations, 
thus  making  a  total  of  36  probability  estimates.  The  estimators  were  given  as 
much  time  as  needed  to  contemplate  the  estimates  required.  Most  of  the  students 
took  between  20  and  30  minutes  to  complete  the  task. 

Results  and  Discussion 

The  task  of  evaluating  judgment  validity  requires  a  choice  of  a  validity 
criterion. A variety of criteria have been proposed and utilized for measuring
the degree of disparity between a given actual proportion Pa and an estimate
Pe of that proportion (Pearl, 1977). We have examined both the quadratic error:

\[ Q = (P_e - P_a)^2 \tag{1} \]

and the logarithmic error:

\[ L = P_a \log\frac{P_a}{P_e} + (1 - P_a) \log\frac{1 - P_a}{1 - P_e}. \]

Both  gave  rise  to  practically  identical  patterns,  so  this  paper  will  present 
data  based  on  the  quadratic  error  only. 
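
The two criteria can be sketched in Python as follows. The function names are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here; the convention 0 log 0 = 0 is also an assumption of the sketch.

```python
import math


def quadratic_error(p_est: float, p_act: float) -> float:
    """Quadratic error Q = (Pe - Pa)^2 between an estimate and the actual proportion."""
    return (p_est - p_act) ** 2


def logarithmic_error(p_est: float, p_act: float) -> float:
    """Logarithmic error L = Pa log(Pa/Pe) + (1-Pa) log((1-Pa)/(1-Pe)).

    Natural logarithm assumed. A term with zero actual weight contributes 0;
    an estimate of 0 for an event that actually occurs gives an infinite error.
    """
    def term(actual: float, estimate: float) -> float:
        if actual == 0.0:
            return 0.0
        if estimate == 0.0:
            return math.inf
        return actual * math.log(actual / estimate)

    return term(p_act, p_est) + term(1.0 - p_act, 1.0 - p_est)
```

Both measures are zero when the estimate equals the actual proportion and positive otherwise. Unlike the quadratic error, the logarithmic error penalizes a miss of a given size more heavily when the actual proportion lies near .00 or 1.00 than when it lies near .50.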

For each query we took Q̄, the arithmetic mean of the quadratic errors
across  subjects,  as  a  measure  of  inaccuracy  of  the  corresponding  estimate. 

These mean quadratic errors, along with the actual proportions and mean
estimates, are shown in Table 2. These estimates are called direct estimates to
distinguish  them  from  synthetic  estimates,  which  will  be  discussed  later. 

Insert Table 2 about here


Table  2  reflects  a  slight  trend  for  the  mean  estimates  to  regress  toward 
the  .50  probability  level  in  relation  to  the  actual  probability.  That  is,  in 
65% of the cases, the proportions were actually 'more extreme' (closer to .00
or  1.00)  than  their  associated  estimates.  This  effect  is  more  apparent  in 
Figure  1,  which  displays  the  relationship  between  the  actual  proportions 
(along the horizontal axis) and their associated estimates (along the vertical
axis).
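
One way to quantify such a regression toward .50 is to count how often the actual proportion lies farther from .50 than its mean estimate. The Python sketch below does this under the assumption that 'more extreme' means a larger absolute distance from .50; the function name is hypothetical and the numbers in the usage note are illustrative, not values from Table 2.

```python
from typing import Iterable, Tuple


def fraction_more_extreme(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (actual, mean_estimate) pairs in which the actual proportion
    is farther from .50 than the estimate (assumed reading of 'more extreme')."""
    pairs = list(pairs)
    count = sum(1 for actual, estimate in pairs
                if abs(actual - 0.5) > abs(estimate - 0.5))
    return count / len(pairs)
```

For the illustrative pairs (.10, .20), (.85, .70), (.45, .40), and (.30, .35), three of the four actual proportions are more extreme than their estimates, so the function returns .75.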


Insert  Figure  1  about  here 


By  and  large,  one  cannot  detect  a  marked  difference  in  accuracy  between 
causal estimates (i.e., Pe(Y|X) and Pe(Y|X̄)) and their diagnostic counterparts
(i.e., Pe(X|Y) and Pe(X|Ȳ)). In Figure 1, for example, where accuracy is
reflected by proximity to the diagonal line, the two families of estimates
appear equally dispersed. However, such a comparison is not entirely reliable.
Since the values of the actual proportions Pa(Y|X) are generally smaller
than those of Pa(X|Y), a direct comparison between their estimates may not
reflect true differences in validity. An estimation error in the neighborhood
of P = .50 is far less severe than an error of equal magnitude near the extremes
(.00 and 1.00). On four of the X-Y relations the actual proportions Pa(Y|X)
and Pa(X|Y) are fairly close to one another (within .15). In all four cases
Pe(Y|X) is at least slightly more accurate than Pe(X|Y), lending some support
to the hypothesis that causal reasoning leads to better inference-making than
diagnostic reasoning. However, if the same procedure is employed with the
Pe(Y|X̄) estimate (invoking causal reasoning) and the Pe(X|Ȳ) estimate (based on
diagnostic reasoning), only three of the seven comparisons show an advantage
for Pe(Y|X̄). Since the difference between causal and diagnostic reasoning
in these 11 comparisons is generally of small magnitude, there is not a
noticeable advantage for the former, as had been anticipated.

Another way to circumvent the 'apples versus oranges' difficulty is to
synthesize  causal  and  diagnostic  estimates  that  can  be  compared  on  equal 
ground.  To  do  this  we  aggregated  subjects'  estimates  by  Bayes'  theorem  to 
calculate  synthetic  estimates  according  to  the  following  equations: 


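\begin{align}
P_s(X) &= P_e(X|Y)\,P_e(Y) + P_e(X|\bar{Y})\,[1 - P_e(Y)] \tag{2} \\
P_s(Y) &= P_e(Y|X)\,P_e(X) + P_e(Y|\bar{X})\,[1 - P_e(X)] \tag{3} \\
P_s(Y|X) &= \frac{P_e(X|Y)\,P_e(Y)}{P_s(X)} \tag{4} \\
P_s(X|Y) &= \frac{P_e(Y|X)\,P_e(X)}{P_s(Y)} \tag{5} \\
P_s(Y|\bar{X}) &= \frac{[1 - P_e(X|Y)]\,P_e(Y)}{[1 - P_e(X|Y)]\,P_e(Y) + [1 - P_e(X|\bar{Y})]\,[1 - P_e(Y)]} \tag{6} \\
P_s(X|\bar{Y}) &= \frac{[1 - P_e(Y|X)]\,P_e(X)}{[1 - P_e(Y|X)]\,P_e(X) + [1 - P_e(Y|\bar{X})]\,[1 - P_e(X)]} \tag{7}
\end{align}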


Note  that  the  synthetic  estimates  in  (2),  (4),  and  (6)  should  be  regarded  as 
diagnostic  since  they  are  deduced  from  diagnostic  inputs.  Similarly,  the 
estimates  constructed  by  formulas  (3),  (5),  and  (7)  are  causal. 
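
The six formulas share one pattern: a triplet (P(A), P(B|A), P(B|not A)) is inverted into (P(B), P(A|B), P(A|not B)). A minimal Python sketch of this inversion follows; the names `Triplet` and `invert` are hypothetical, and the sketch assumes the synthesized marginal lies strictly between 0 and 1. With A = Y and B = X it yields the diagnostic syntheses (2), (4), and (6); with A = X and B = Y it yields the causal syntheses (3), (5), and (7).

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    prior: float        # P(A)
    given_true: float   # P(B | A)
    given_false: float  # P(B | not A)


def invert(t: Triplet) -> Triplet:
    """Invert (P(A), P(B|A), P(B|not A)) into (P(B), P(A|B), P(A|not B))."""
    # Total probability: the synthesized marginal, as in formulas (2) and (3).
    p_b = t.given_true * t.prior + t.given_false * (1.0 - t.prior)
    if not 0.0 < p_b < 1.0:
        raise ValueError("synthesized marginal must lie strictly between 0 and 1")
    # Bayes' theorem given B, as in formulas (4) and (5).
    p_a_given_b = t.given_true * t.prior / p_b
    # Bayes' theorem given not-B, as in formulas (6) and (7).
    p_a_given_not_b = (1.0 - t.given_true) * t.prior / (1.0 - p_b)
    return Triplet(p_b, p_a_given_b, p_a_given_not_b)
```

For a coherent set of inputs the inversion is its own inverse: applying `invert` to the hypothetical causal triplet (.40, .50, 1/6) gives the diagnostic triplet (.30, 2/3, 2/7), and applying it again recovers the original. Subjects' direct estimates need not be coherent in this sense, which is why a direct estimate and its synthetic counterpart can differ.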

Furthermore, the synthetic estimates are more reflective of the
transformations employed by common Decision Analysis procedures. For example, formula (5)
represents the celebrated transformation from prior to posterior which was
pioneered (posthumously) by Reverend Bayes in 1763 as a means to infer the
"probability of causes". It has since become almost a ritual to assume that
this transformation automatically produces more valid judgments than the direct
estimate Pe(X|Y).

Table 3 shows the mean quadratic error, Q̄, for both the direct estimate
and the synthetic estimate for each of the four conditional probabilities.
The direct estimates for P(Y|X) and P(Y|X̄) involve causal reasoning and the
direct estimates for P(X|Y) and P(X|Ȳ) involve diagnostic reasoning, while
this relationship reverses when the synthetic estimates are considered.


Insert  Table  3  about  here 


Also shown is an indicator called the normalized error difference which gives
a  measure  of  the  significance  of  the  difference  between  the  direct  estimate  and 
the  synthetic  estimate.  It  was  computed  by  the  following  formula: 


\[ \text{normalized error difference} = \frac{\bar{Q}_{\text{diagnostic}} - \bar{Q}_{\text{causal}}}{\sqrt{\sigma^2_{\text{causal}} + \sigma^2_{\text{diagnostic}}}} \tag{8} \]

where Q̄_diagnostic and Q̄_causal stand for the mean quadratic error across subjects
for either the direct or synthetic estimates, as appropriate, and σ² represents
the variance of those quadratic errors. One property of this normalized error
difference is rather obvious: its value is made increasingly positive when the
validity of the causal estimate becomes significantly greater than that of the
diagnostic estimate, and negative when the reverse is true. Clearly, since the
same actual proportion applies to both the direct estimate and the synthetic
estimate for a particular probability, the 'apples versus oranges' problem is
eliminated.
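
A minimal Python sketch of this indicator, read literally from formula (8), is given below. The function name is hypothetical. Taking σ² to be the sample variance of the per-subject quadratic errors is an assumption; if the intended statistic is a t-ratio, each variance would instead be divided by the number of subjects.

```python
import math
from statistics import mean, variance
from typing import Sequence


def normalized_error_difference(
    causal_errors: Sequence[float],
    diagnostic_errors: Sequence[float],
) -> float:
    """(mean diagnostic error - mean causal error) / sqrt(var_causal + var_diagnostic).

    Inputs are per-subject quadratic errors for the same target probability.
    Sample variances are assumed (statistics.variance), so each sequence
    needs at least two values.
    """
    q_causal = mean(causal_errors)
    q_diagnostic = mean(diagnostic_errors)
    spread = math.sqrt(variance(causal_errors) + variance(diagnostic_errors))
    return (q_diagnostic - q_causal) / spread
```

The value is positive when the causal estimates have the smaller mean error and negative when the diagnostic estimates do, matching the sign convention described above.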




Across all estimates, there are nine instances where the normalized error
difference is significant at the .05 level according to a standard two-tailed
t-distribution. In six of these cases, it is the causal estimate that is
better  than  the  diagnostic,  which  leaves  three  cases  in  which  the  diagnostic 
is  better.  Thus,  there  is  little  evidence  in  these  data  for  the  superiority 
of  causal  reasoning  over  diagnostic  reasoning.  In  fact,  only  one  of  the 
problems  (musical  instrument)  shows  a  positive  normalized  error  difference  for 
all  four  conditional  probabilities,  while  one  other  problem  (typing)  has  a 
negative  normalized  error  difference  for  all  four  conditionals. 

Aside from comparing causal and diagnostic estimates, Table 3 also enables
us  to  compare  the  validities  of  direct  versus  synthetic  estimates.  A  suspicion 
that  the  latter  may  be  more  valid  than  the  former  could  be  based  on  the  argument 
that each synthetic estimate combines the output of three knowledge sources. If
these  were  independent  mental  processes  in  the  sense  that  the  estimator  providing 
them would consult different data or invoke different procedures for their
production, then one would be justified in hypothesizing superiority for synthetic
estimates over their direct counterparts. Comparing the data, one finds that in
five  of  the  nine  significant  cases,  synthetic  estimates  are  better  than  their 
direct  counterparts. 

Table  4  shows  the  mean  quadratic  errors  for  direct  estimates  and  synthetic 
estimates with questions grouped according to the type of causality implied in the
X-Y relations. These were obtained by averaging the quadratic errors over the X-Y
relations within each causal category. In general, genetic relations induce slightly
more accurate estimates than training and habit-forming relations, while
self-interest relations induce the worst estimates of all. This pattern is true for
both direct estimates and synthetic estimates. For each causality category the
synthetic estimates are more valid than the direct estimates of P(X|Y), and less
valid for P(Y|X). In each case, the more valid estimates are those based on
causal reasoning, which lends support to the conjecture that causal reasoning is
more naturally invoked in interpreting common observations. However, when
considering the other four columns (P(X), P(Y), P(Y|X̄), P(X|Ȳ)), the pattern
of  results  no  longer  reflects  causal  superiority. 


Insert  Table  4  about  here 


Conclusions 

Admittedly,  having  ourselves  adhered  to  the  belief  that  causal  reasoning 
is a more natural mode of inference-making, we were somewhat surprised that the
results do not show a stronger validity differential in this direction. Taking
Table  3,  for  example,  the  overall  mean  of  the  normalized  error  difference  is 
equal  to  .25,  which  clearly  does  not  support  the  hypothesis  of  general  causal 
superiority.  In  the  few  X-Y  relations  where  significant  validity  differentials 
were  detected,  there  was  not  a  sizable  bias  favoring  the  causal  mode.  Thus,  if 
the  validity  of  judgments  produced  by  a  given  mode  of  reasoning  is  a  measure 
of  whether  that  mode  matches  the  format  of  human  semantic  memory,  then  neither 
the causal nor diagnostic schema is a more universal or more natural format
for encoding knowledge about common, everyday experiences. It appears that
semantic memory contains both causal schema and diagnostic schema. Which is
invoked for a particular observational relation may depend on the nature of
the  relation,  the  anticipated  mode  of  usage,  and  the  level  of  training  or 
familiarity  of  the  observer. 

These  findings  imply  that  one  should  approach  the  'divide  and  conquer' 
ritual  with  caution;  not  every  division  leads  to  a  conquest,  even  when  the 
resultant atoms are cast in causal phrasings. Forced transformations from
diagnostic to causal judgments performed at the expense of conceptual simplicity
may lead to inferences of lower quality than direct, 'holistic' judgments.

Table 1


     X Questions                                Y Questions

Genetic Causality

 1.  Mother has blue eyes.                      Student has blue eyes.

 2.  At least 1 of student's parents            Student is left-handed.
     is left-handed.

 3.  Student is a male over 5'9".               Student played on high school
                                                basketball team.

Training Causality

 4.  Student took musical lessons as            Student currently plays a musical
     a child.                                   instrument.

 5.  Student ran or jogged regularly            Student currently runs or jogs
     in high school.                            regularly.

 6.  Student took typing in high school.        Student types ≥ 40 words/min now.

Habit-forming Causality

 7.  Student attended church regularly          Student attends church regularly
     in high school.                            now.

 8.  Student is currently married.              Student is wearing a wedding ring.

 9.  Student's father was 'handy'               Student changes his own oil in his
     around home.                               car.

Self-Interest Causality

10.  Student finds it financially               Student favors UCLA increasing
     difficult to complete his college          financial aid at expense of larger
     studies.                                   classes.

11.  Student's family finds medical             Student favors nationalized medical
     expenses to constitute a substantial       care plan.
     burden.

12.  Student closely follows UCLA               Student favors UCLA building
     football.                                  on-campus football stadium.

Errors for Direct Versus Synthetic Estimates


Mean Quadratic Errors for Different Causal Categories


Dir. = Direct Estimates